Meta-automated machine learning with improved multi-armed bandit algorithm for selecting and tuning a machine learning algorithm

ABSTRACT

A method for automatically selecting a machine learning algorithm and tuning hyperparameters of the machine learning algorithm includes receiving a dataset and a machine learning task from a user. Execution of a plurality of instantiations of different automated machine learning frameworks on the machine learning task is controlled, each as a separate arm, in consideration of available computational resources and time budget, whereby, during the execution by the separate arms, a plurality of machine learning models are trained and performance scores of the plurality of trained models are computed. One or more of the plurality of trained models are selected for the machine learning task based on the performance scores.

CROSS-REFERENCE TO PRIOR APPLICATION

Priority is claimed to U.S. Provisional Patent Application No. 62/962,223, filed on Jan. 17, 2020, the entire contents of which is hereby incorporated by reference herein.

FIELD

The present invention relates to machine learning (ML), and in particular to a method and system for meta-automated ML which uses a multi-armed bandit algorithm for selecting and tuning an ML algorithm.

BACKGROUND

When applying ML, several high-level decisions have to be taken. For example, a learning algorithm, or base learner, needs to be selected from a plethora of different available learning algorithms. Each learning algorithm comes with a different set of hyperparameters that can be optimized to maximize the algorithm's performance concerning an application-specific error metric for a given dataset. Also, different feature preprocessing algorithms and feature selection techniques, each with their set of hyperparameters, can be combined into an ML pipeline to improve the base learner's performance. Accordingly, different hyperparameters need to be tuned and different data preprocessing and feature engineering techniques may be applied. Automated machine learning (AutoML) investigates the automation of selecting base learners and preprocessors as well as tuning the associated hyperparameters.

First, AutoML is motivated by the aim of allowing non-experts to leverage ML. Second, it is also motivated by making the process of applying ML more efficient, e.g., by using automation to lower the workload of expert data scientists. Third, AutoML is desired to provide a principled approach for applying base learners to ML problems (see, e.g., Mischa Schmidt, et al., “On the Performance of Differential Evolution for Hyperparameter Tuning,” arXiv:1904.06960v1, (Apr. 15, 2019), which is hereby incorporated by reference herein). Anh Truong, et al., “Towards Automated Machine Learning: Evaluation and Comparison of AutoML Approaches and Tools,” arXiv:1908.05557v2, (Sep. 3, 2019), which is hereby incorporated by reference herein, describe the potential of AutoML to reduce repetitive tasks in ML pipelines and thereby boost productivity for data scientists, ML engineers and ML researchers using a number of different tools and platforms which attempt to automate repetitive tasks.

For automating traditional ML, several open source software frameworks exist, for example, as listed in Schmidt, et al., “On the Performance of Differential Evolution for Hyperparameter Tuning,” arXiv:1904.06960v1 (Apr. 15, 2019), which is hereby incorporated by reference herein. The associated scientific studies usually document the feasibility and ML performance on a range of well-known test datasets provided, for example, in the OpenML community. Frameworks such as these usually attempt to find, for a user's ML task, the most suitable ML algorithm with the best performing hyperparameter settings (and train the selected and parametrized algorithm on the task's data). This is referred to as algorithm selection and hyperparameter tuning. Further, the frameworks train the selected and parametrized algorithm on the data of the ML task. It is noted that the publications mentioned above do not describe how a layman user can easily invoke AutoML, but require programming proficiency.

Dedicated to deep learning, the topic of neural architecture search (NAS) is addressed in a number of publications. The frameworks described in these publications focus on devising architectures of neural networks by parametrizing deep learning libraries such as keras (keras.io) or tensorflow.

Recently, Micah J. Smith, et al., “The Machine Learning Bazaar: Harnessing the ML Ecosystem for Effective System Development,” arXiv:1905.08942v3 (Nov. 12, 2019), which is hereby incorporated by reference herein, describe an AutoML framework referred to as AutoBazaar which is based on the concept of ML pipeline templates, leveraging various different existing ML and data manipulation libraries. The pipeline template is AutoBazaar's means of abstraction for algorithm selection and hyperparameter tuning. For that, AutoBazaar describes an approach (algorithm 2) that iterates over the consecutive steps of selecting algorithms (actually, pipeline selection among many possible candidate pipeline variants), tuning the corresponding hyperparameters (of the various algorithms/steps entailed in the selected pipeline) and training the pipeline (including the ML algorithm inside the pipeline).

U.S. Patent Application Publication No. 2016/0132787 describes a cloud AutoML concept as follows: users define data runs, or tasks, and enter them into a database. One of potentially many worker nodes (in a cloud setting) identifies, via a selection strategy, the so-called “hyperpartition” for which to tune hyperparameters. During tuning, models are already trained and tested on the given dataset in order to compute performance scores based on a performance function for the user-specified task (e.g., the well-known Mean Squared Error (MSE) metric). Selection strategies can either be uniform at random, or, building on the performance scores reached so far, a standard multi-armed bandit algorithm (called UCB1), or one of two variants of the multi-armed bandit algorithm that can cope with drifting rewards, denoted ‘BestK’ and ‘BestKVelocity’. The hyperpartition is defined as the choice of categorical hyperparameters, for example which algorithm to run. To tune hyperparameters, the commonly known Bayesian optimization via Gaussian processes is applied. By applying the selection strategy and then Bayesian optimization, the worker identifies what training job (ML algorithm/pipeline) is to be applied to the dataset next and enters a corresponding training job description into a central database. When a worker node is available (i.e., idle), it will check the central database, work on one of the potentially many training jobs in that database and mark the training job as started to prevent other workers from working on the same job. When ML is complete, the worker will store the performance and the associated model and check for a next training job, or attempt to create a new training job for one of the specified data runs.
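
For illustration only (a minimal sketch in Python, not part of the cited publication's disclosure), the standard UCB1 selection rule mentioned above can be expressed as follows; the hyperpartition names and the score bookkeeping are hypothetical:

import math

def ucb1_select(counts, mean_rewards):
    """Pick the arm maximizing mean reward plus an exploration bonus (standard UCB1)."""
    total_pulls = sum(counts.values())
    # Play every arm at least once before applying the UCB1 formula.
    for arm, n in counts.items():
        if n == 0:
            return arm
    scores = {
        arm: mean_rewards[arm] + math.sqrt(2.0 * math.log(total_pulls) / counts[arm])
        for arm in counts
    }
    return max(scores, key=scores.get)

# Hypothetical usage: number of pulls and mean performance score per hyperpartition.
counts = {"hp_0": 3, "hp_1": 1, "hp_2": 2}
mean_rewards = {"hp_0": 0.71, "hp_1": 0.65, "hp_2": 0.80}
print(ucb1_select(counts, mean_rewards))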

There are also AutoML as a service (AMLaaS) offerings such as GOOGLE Cloud AutoML focusing on deep learning approaches, MICROSOFT AzureML leveraging AZURE ML algorithms, SALESFORCE TransmogrifAI and UBER Ludwig. The internal operations of these mechanisms are however not disclosed, thus it is not public knowledge how these scale their AMLaaS operations to millions of user requests. Also, it is not known or disclosed how these mechanisms would be able to leverage and match existing ML or deep learning algorithms.

SUMMARY

In an embodiment, the present invention provides a method for automatically selecting a machine learning algorithm and tuning hyperparameters of the machine learning algorithm. A dataset and a machine learning task are received from a user. Execution of a plurality of instantiations of different automated machine learning frameworks on the machine learning task is controlled, each as a separate arm, in consideration of available computational resources and time budget, whereby, during the execution by the separate arms, a plurality of machine learning models are trained and performance scores of the plurality of trained models are computed. One or more of the plurality of trained models are selected for the machine learning task based on the performance scores.

BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments of the present invention will be described in even greater detail below based on the exemplary figures. The present invention is not limited to the exemplary embodiments. All features described and/or illustrated herein can be used alone or combined in different combinations in embodiments of the present invention. The features and advantages of various embodiments of the present invention will become apparent by reading the following detailed description with reference to the attached drawings which illustrate the following:

FIG. 1 schematically illustrates an embodiment of a system and method for meta-AutoML according to the present invention, which is referred to herein as Hierarchical Automated Machine LEarning with Time-awareness (HAMLET) due to the hierarchical decision making and the ability to account for the progress of time;

FIG. 2 illustrates docker options;

FIG. 3 is an exemplary graph showing how a learning curve (LC) is extrapolated based on observed learning curve values according to an embodiment of the present invention;

FIGS. 4a, 4b and 4c respectively show a first experiment as boxplots of HAMLET Variant 1 ranks with budgets of 15 minutes (FIG. 4a), 30 minutes (FIG. 4b) and 1 hour (FIG. 4c), and that, with smaller budgets, the results do not change qualitatively;

FIGS. 5a, 5b, 5c and 5d respectively show the first experiment as boxplots of HAMLET Variant 3 ranks with budgets of 10 minutes (FIG. 5a), 15 minutes (FIG. 5b), 30 minutes (FIG. 5c) and 1 hour (FIG. 5d), and that, with smaller budgets, the results do not change qualitatively;

FIGS. 6a, 6b and 6c respectively show the first experiment as boxplots of ranks for inter-policy comparisons with budgets of 15 minutes (FIG. 6a), 30 minutes (FIG. 6b) and 1 hour (FIG. 6c), and that for B=900s the policies are statistically indistinguishable;

FIGS. 7a, 7b and 7c respectively show a second experiment as boxplots of HAMLET Variant 1 ranks with budgets of 2 hours (FIG. 7a), 3 hours (FIG. 7b) and 12 hours (FIG. 7c);

FIGS. 8a, 8b, 8c and 8d respectively show the second experiment as boxplots of HAMLET Variant 3 ranks with budgets of 1 hour (FIG. 8a), 2 hours (FIG. 8b), 3 hours (FIG. 8c) and 12 hours (FIG. 8d);

FIGS. 9a, 9b, 9c, 9d, 9e, 9f, 9g and 9h respectively show the second experiment as boxplots of ranks for inter-policy comparisons with budgets of 15 minutes (FIG. 9a), 30 minutes (FIG. 9b), 45 minutes (FIG. 9c), 1 hour (FIG. 9d), 2 hours (FIG. 9e), 3 hours (FIG. 9f), 6 hours (FIG. 9g) and 12 hours (FIG. 9h); and

FIG. 10 shows intervals of 95% confidence of the different policies' mean ranks, aggregated over all datasets and budgets.

DETAILED DESCRIPTION

Embodiments of the present invention provide a system and method for meta-AutoML that leverages existing frameworks and libraries for algorithm selection and hyperparameter tuning, as well as ensembling. The method and system use a modified multi-armed bandit algorithm enhanced with learning curve extrapolation to predict each arm's performance. The system is designed for convenient operability. Embodiments of the present invention are favorably designed so as to integrate NAS frameworks in a conceptually similar way as the AutoML frameworks for traditional ML algorithms.

While it is possible to use multi-armed bandit algorithms such as UCB1 for learning to select optimally among multiple choices, typically these algorithms assume stationary reward distributions (in essence, the expected rewards received by the multi-armed bandit algorithm for the different arms should not change over time). In AutoML settings, however, the rewards are non-stationary as the arms, such as the hyperpartitions in U.S. Patent Application Publication No. 2016/0132787 or the AutoML frameworks in embodiments of the present invention, become better the more computation time they are awarded (i.e., the more they are pulled). U.S. Patent Application Publication No. 2016/0132787 addresses that via the variants Best-K (reflecting only a subset of the received rewards—the size of the subset is identified via the configuration parameter K) or Best-K-Velocity (reflecting the differences of the rewards in the subset of K best rewards). However, these multi-armed bandit algorithms only look into the past and thus do not properly represent the problem.
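
As a rough illustration of the difference (a sketch reflecting one reading of the cited publication, not its exact implementation), Best-K can be thought of as scoring an arm by the average of its K best rewards, while Best-K-Velocity scores it by the average improvement among those K best rewards; both are computed purely from past observations:

def best_k_reward(rewards, k):
    """Average of the k best rewards observed for an arm so far."""
    top = sorted(rewards, reverse=True)[:k]
    return sum(top) / len(top)

def best_k_velocity(rewards, k):
    """Average pairwise improvement among the k best rewards (a crude drift signal)."""
    top = sorted(rewards, reverse=True)[:k]
    diffs = [top[i] - top[i + 1] for i in range(len(top) - 1)]
    return sum(diffs) / len(diffs) if diffs else 0.0

scores = [0.60, 0.72, 0.71, 0.74, 0.75]
print(best_k_reward(scores, 3), best_k_velocity(scores, 3))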

With respect to providing AMLaaS and coordination of workload, the known approaches except for the approach described in U.S. Patent Application Publication No. 2016/0132787 do not concern themselves with how to scale to provide AMLaaS, e.g., using cloud computing. In U.S. Patent Application Publication No. 2016/0132787, a large-scale distributed architecture is used and it is compatible with cloud services. In this publication, workers pull work from a central database, and work means either running one ML job (training an ML algorithm as parametrized) or fetching a data-run to decide a hyperpartition and hyperparametrization to enter into the database for other workers to fetch/work on. The approach of U.S. Patent Application Publication No. 2016/0132787 requires that jobs or data-runs are assigned priorities and that workers then select central database entries, for example taking into account these priorities. The inventors have recognized that this is a disadvantage, as it requires user interaction and knowledge in terms of assigning priorities. Further, the approach of U.S. Patent Application Publication No. 2016/0132787 has certain characteristics/implications that are detrimental to the AutoML problem. The following holds for all learning strategies in U.S. Patent Application Publication No. 2016/0132787 (i.e., not the uniform-at-random strategy, which is suboptimal itself, as it cannot exploit observations made during training): the approach chooses a hyperpartition for a data run using a selection strategy, specifically a multi-armed bandit algorithm. Multi-armed bandit algorithms such as UCB1, Best-K or Best-K-Velocity as described in U.S. Patent Application Publication No. 2016/0132787 will select based on observed experiences (performance scores of already trained models of the different possible hyperpartitions). When the multi-armed bandit algorithm observes a new experience in the database (i.e., a worker finished the job and stored the model along with its performance in the database), it will update its statistics. The way U.S. Patent Application Publication No. 2016/0132787 specifies the algorithms, the algorithms can only observe a new experience when a worker finishes training a model for the data run in question. After a hyperpartition is chosen, a tuning strategy is applied. That tuning strategy is Bayesian optimization, and it updates its probabilistic models on experiences of its associated hyperpartition that have already been observed. These models are then used to identify the most promising hyperparameter configuration, for example using the well-defined expected improvement criterion. The Bayesian optimization is only applied after choosing a hyperpartition, is separate from the multi-armed bandit framework, and cannot be used beforehand to predict the performance of individual arms. Conventionally, there is a single most promising hyperparameter configuration for which the worker then requests execution via the central database. In summary, the mechanisms of U.S. Patent Application Publication No. 2016/0132787 do not explain how multiple jobs for a single data run can be requested and executed (as the models need to be updated).

Embodiments of the present invention can leverage different frameworks. AutoBazaar, auto-sklearn, the framework of U.S. Patent Application Publication No. 2016/0132787, etc. require tuning the ML algorithms' parameters and do not leverage different AutoML frameworks. In other words, known approaches only consider different ML algorithms and their hyperparameters, as opposed to considering different AutoML frameworks or algorithms to do algorithm selection and hyperparameter tuning. The inventors have discovered that this can lead to disadvantages for the following reasons: AutoML is still a field of active research, new ideas and open-source frameworks are published frequently, and no clear cutting-edge framework exists (see Anh Truong, et al., “Towards Automated Machine Learning: Evaluation and Comparison of AutoML Approaches and Tools,” arXiv:1908.05557v2, (Sep. 3, 2019)). Additionally, for different types of datasets and problems, different algorithms for hyperparameter tuning and algorithm selection can lead to the best results (see, e.g., Mischa Schmidt, et al., “On the Performance of Differential Evolution for Hyperparameter Tuning,” arXiv:1904.06960v1, (Apr. 15, 2019)). Further, for example, running Gaussian processes for tuning hyperparameters for neural networks is still a field of active research. If new algorithms are found or new frameworks are published, the above-mentioned previous approaches cannot easily integrate them into their systems. They would need to solve the problems for their particular framework setting and implement them by themselves to fit their frameworks' needs; for example, they would need to solve how the Bayesian optimization approach would work efficiently for NAS. This means that new research findings cannot easily be integrated, which leads to delays in improvement.

Additionally, it is not apparent how approaches such as in U.S. Patent Application Publication No. 2016/0132787 could integrate AMLaaS frameworks such as GOOGLE AutoML. This can be a disadvantage, as, for example, integrating GOOGLE AutoML might lead to better results in terms of score than open-source frameworks.

Further, in the above-mentioned previous approaches, programming skills are needed to work with the AutoML frameworks and no easy-to-use application programming interface (API) exists. This decreases the ease of use for layman users (who constitute an important target audience for AutoML).

Embodiments of the present invention overcome the above-described issues and problems of the previous approaches.

First, with respect to the issue concerning non-stationary rewards, embodiments of the present invention provide an extrapolating multi-armed bandit algorithm, which means looking into the future by fitting learning curve functions to the arms' past rewards and extrapolating them until the end of the remaining time budget. Therefore, embodiments of the present invention provide improved insight into how to assign time budget among the alternatives as time progresses. The approach of working on learning curves as they evolve over runtime is also very beneficial in another aspect when compared to the “function invocation”-based evaluation in U.S. Patent Application Publication No. 2016/0132787 (and other works), in which the mechanism updates all statistics based on the model performance recorded in the central database, and these performance statistics are updated based on model performance after evaluating the trained model. That implies that, before being able to update statistics, model training (or model trainings if parallel trainings are performed) has to finish, as the performance statistics are based on the unit of “function evaluations” (where function refers to training a parametrized algorithm on the dataset and recording its performance). This is owing to the fact that the multi-armed bandit algorithms and Bayesian optimization in U.S. Patent Application Publication No. 2016/0132787 work on past samples (i.e., only after they see new data (a newly trained model's performance) can they update their predictions and get to better predictions). By leveraging learning curve extrapolation over time in embodiments of the present invention, sampling can occur every x seconds and the predictions for the multi-armed bandit's arms' learning curves can be updated. As long as no new best model is reported with its score, the performance of the arm simply stays constant while time progresses (and the budget is reduced). This also means that embodiments of the present invention can at any time stop execution of an arm and assign execution rights to another if that other arm is predicted to perform favorably based on the learning curve. In contrast, the approach of U.S. Patent Application Publication No. 2016/0132787 would have to wait until a model has finished execution in order to see changes in the arms' evaluations, and thus change the choice of the multi-armed bandit algorithm (or the Bayesian optimization), which may take a long time.
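
A minimal sketch of this extrapolation idea, assuming scores are sampled periodically and an arctan-shaped learning curve family (one of the curve families mentioned further below), might look as follows; the observed time/score samples are hypothetical:

import numpy as np
from scipy.optimize import curve_fit

def arctan_curve(t, a, b, c):
    """Saturating learning-curve family: the score grows with time and levels off."""
    return a * np.arctan(b * t) + c

# Hypothetical observed (seconds of arm runtime, best score so far) samples for one arm.
t_obs = np.array([30.0, 60.0, 120.0, 240.0, 480.0])
score_obs = np.array([0.55, 0.63, 0.70, 0.74, 0.76])

params, _ = curve_fit(arctan_curve, t_obs, score_obs, p0=[0.2, 0.01, 0.5], maxfev=10000)

# Extrapolate to the end of the remaining time budget (e.g., a 3600 s total budget).
predicted_final_score = arctan_curve(3600.0, *params)
print(predicted_final_score)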

Second, with respect to the issues with providing AMLaaS and coordination of workload, embodiments of the present invention are designed to scale from single computer to cloud settings easily, as described below, and thus overcome the issue of scaling to provide operation in the AMLaaS setting. The difference in coordinating AutoML workload between the approach of U.S. Patent Application Publication No. 2016/0132787 and embodiments of the present invention is self-evident: in the approach of U.S. Patent Application Publication No. 2016/0132787, workers pull work from a central database, and work means either running one machine learning job (training a machine learning algorithm as parametrized) or fetching a data-run to decide a hyperpartition and hyperparametrization to enter into the database for other workers to fetch/work on. In contrast, embodiments of the present invention use a dispatcher-master-worker concept, in which a master is assigned an ML task by the dispatcher and the master assigns a time budget to the workers that the dispatcher collaborates with, wherein multiple workers can run in parallel if desired. This is beneficial to control resources on a per-AutoML-task basis (a task is referred to as a “data-run” in U.S. Patent Application Publication No. 2016/0132787). The approach of U.S. Patent Application Publication No. 2016/0132787 requires that jobs or data-runs are assigned priorities and that workers then select central database entries, for example taking into account these priorities.

Third, with respect to leveraging different frameworks and in contrast to the above-mentioned previous approaches, embodiments of the present invention are able to leverage different AutoML frameworks (which run their internal algorithm selection and parameter tuning logics), which can be chosen among by means of the improved multi-armed bandit algorithm. The approach according to embodiments of the present invention, while conceptually simpler, is desirable as it can overcome the disadvantages discussed above. Additionally, embodiments of the present invention can integrate AMLaaS frameworks, such as GOOGLE AutoML, to lead to better scores.

Fourth, with respect to providing ease of use for layman users, embodiments of the present invention overcome the issues discussed above by providing an easy-to-use API. Therefore, embodiments of the present invention provide for easier and greater access to ML.

In an embodiment, the present invention provides a method for automatically selecting a machine learning algorithm and tuning hyperparameters of the machine learning algorithm. A dataset and a machine learning task are received from a user. Execution of a plurality of instantiations of different automated machine learning frameworks on the machine learning task is controlled, each as a separate arm, in consideration of available computational resources and time budget, whereby, during the execution by the separate arms, a plurality of machine learning models are trained and performance scores of the plurality of trained models are computed. One or more of the plurality of trained models are selected for the machine learning task based on the performance scores.

In an embodiment, the performance scores are extrapolated for a remainder of the time budget based on achieved performances of respective ones of the arms during a time interval of the execution which is a portion of the time budget.

In an embodiment, the method further comprises assigning the computational resources to the arms during the remainder of the time budget based on the extrapolated performance scores.

In an embodiment, the performance scores are extrapolated by fitting a learning curve function to past rewards of the respective ones of the arms and extrapolating the past rewards until an end of the remainder of the time budget.

In an embodiment, the method further comprises freezing the execution of at least one of the arms based on the extrapolated performance scores.

In an embodiment, the method further comprises resuming the execution of the at least one of the arms from a point at which the freezing occurred.

In an embodiment, at least some of the arms are executed by time multiplexing using a selection mechanism to allocate the computational resources to the arms during the time budget.

In an embodiment, at least some of the arms are executed in parallel.

In an embodiment, the method further comprises building an ensemble from the plurality of trained models.

In an embodiment, each of the arms is executed as a microservice component of a cloud computer system architecture in a docker container which has a container image for a respective one of the automated machine learning frameworks.

In an embodiment, the docker containers are contained within a larger docker container, which contains separate docker containers for components which control the execution of the arms.

In an embodiment, the method further comprises constructing a learning curve for each of the arms during a time interval of the execution within the time budget, extrapolating performance scores of each of the arms until a remainder of the time budget, and freezing or disabling execution of at least some of the arms based on the extrapolated performance scores.

In an embodiment, the learning curves are constructed based on maximum performance scores achieved by respective ones of the arms during the time interval.

In another embodiment, the present invention provides a microservice component encapsulated in a docker container of a cloud computing system architecture comprising one or more processors which, alone or in combination, are configured to provide for execution of a method comprising: controlling execution of a plurality of instantiations of different automated machine learning frameworks on a machine learning task each as a separate arm in consideration of available computational resources and time budget, whereby, during the execution by the separate arms, a plurality of machine learning models are trained and performance scores of the plurality of trained models are computed.

In a further embodiment, the present invention provides a tangible, non-transitory computer-readable medium having instructions thereon which, upon being executed by one or more processors, alone or in combination, provide for execution of a method according to an embodiment of the present invention.

Embodiments of the present invention provide a meta-AutoML system and method which is referred to herein as HAMLET. HAMLET leverages existing AutoML frameworks (which in turn leverage existing ML libraries) to solve a user's task. HAMLET supports parallel execution of different users' tasks and supports deployment in different settings. HAMLET additionally comes with an easy-to-use API in order to support usage by layman users. Further, HAMLET can manage given time budget limitations (and hardware limitations) by using a special multi-armed bandit algorithm.

HAMLET can be provided as an AutoML platform with the following characteristics in order to find the best possible model for a given ML task:

- automates algorithm selection and hyperparameter tuning for a very wide range of algorithms (by integration of different frameworks), for different types of ML tasks,
- can be used by multiple (possibly layman) users on different hardware settings,
- includes time budget management, and
- eases the access to ML by an easy-to-use API.

Table 1 below shows terms and definitions used herein to describe HAMLET.

TABLE 1

Ensemble: A set of machine learning models to be used jointly for prediction tasks.

Ensemble Learning: A domain of machine learning that focuses on the learning of ensembles of baseline models and their aggregation functions.

HAMLET task: The high-level machine learning problem specification, consisting, e.g., of the dataset to be learned upon, the type of learning problem (regression, classification or clustering) and the loss function; also a stopping criterion can be specified.

Input variables: The fields/variables within the dataset that the machine learning model should use to predict the target variable during the prediction phase. Sometimes also called “explanatory variables” in the machine learning literature.

Loss function: The error function to be used for penalizing prediction errors during the training process. AutoML tries to optimize for this loss function.

Machine Learning Model, or “Model”: A configured and trained instance of a machine learning pipeline applied to a particular training dataset. During solving a task, many models are trained.

Stopping criterion: A user specified criterion telling HAMLET when to stop AutoML. This can be in the form of a time budget, a minimum required performance threshold (related to the loss function) for the user's application, or lack of improvement (e.g., a period of time or number of main loop iterations) during which HAMLET's tuners did not improve in performance anymore.

Target variables: The fields/variables within the dataset that the machine learning model should predict during the prediction phase based on the input variables. The target variables need to be present (only) during the training phase. May not be present in the dataset if the task is of an unsupervised type, e.g., clustering.

Time Budget: A time budget specified by the user for applying AutoML for the specified task. This can be in the form of a wallclock time budget (e.g., hours), or in terms of a total compute resource time budget (e.g., CPU hours).

FIG. 1 depicts the HAMLET system architecture. In a beneficial embodiment, the interfaces HAMLET user-Dispatcher, Dispatcher-HAMLET MasterBandit, and HAMLET MasterBandit-HAMLET arm are realized as HTTP interfaces hosted, e.g., in a standard web framework such as flask for python. The interfaces to the depicted data storage components (database tables or separate databases) are based on standard approaches, e.g., based on structured query language (SQL). It is possible to merge or separate data storage or replace the depicted databases with, e.g., a file system based approach such as the hadoop distributed file system (HDFS). These realization choices affect the technical realization of interfaces C1-C5, E1-E5 and F1-F3.

The dispatcher component is the point of contact for the HAMLET user to request the HAMLET service. Therefore, the dispatcher's interface A may be used by the user to:

- upload a dataset and define a dataset description,
- provide a machine learning task description (e.g., performance function, budget, possibly which AutoML frameworks and configurations to use),
- request starting of a configured task,
- receive information on the progress of a specified machine learning task, e.g., performance scores and elapsed budget,
- optionally receive a notification when the training finished,
- receive an indication of which was/were the best performing model(s) or ensemble(s),
- optionally receive the trained models as, e.g., serialized binary objects in a standardized format such as pickled python objects, and
- receive references to the models within HAMLET, such as unique identifiers of the models within a HAMLET database.
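
For illustration, a task submission over interface A could look roughly as follows; the endpoint paths, field names and the use of the requests library are assumptions made for this sketch and are not a fixed HAMLET API:

import requests

BASE_URL = "http://hamlet.example.com/api/v1"  # hypothetical dispatcher endpoint

# Upload the dataset as a csv file (standard multipart file upload).
with open("pedestrians.csv", "rb") as f:
    dataset = requests.post(f"{BASE_URL}/datasets", files={"file": f}).json()

# Register the task: problem type, loss function, stopping criterion, input/target variables.
task = {
    "dataset_id": dataset["dataset_id"],
    "problem_type": "regression",
    "loss_function": "mean_squared_error",
    "time_budget_seconds": 3600,
    "input_variables": ["weather", "weekday", "event_scheduled"],
    "target_variables": ["pedestrian_count"],
    "arms": ["auto-sklearn", "auto-keras"],
}
task_id = requests.post(f"{BASE_URL}/tasks", json=task).json()["task_id"]

# Start training and poll progress (performance scores and elapsed budget).
requests.post(f"{BASE_URL}/tasks/{task_id}/start")
print(requests.get(f"{BASE_URL}/tasks/{task_id}/progress").json())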

The dispatcher registers tasks with the MasterBandit and invokes it to start training. For that, the dispatcher passes relevant configuration parameters (e.g., dataset reference, task description, arm configurations) via interface B.

A HAMLET arm is an instantiation of an AutoML framework/algorithm for hyperparameter tuning and algorithm selection—this can, e.g., be an instantiation of a certain framework, or of a framework with a certain user-defined configuration. Usually multiple HAMLET arms exist. The HAMLET MasterBandit controls execution of the HAMLET arms based on different decision rules (different embodiments). It also manages the budget left for the task and checks whether requirements (regarding the score to be reached) are met. Interface D carries the interactions among the MasterBandit and the arms—most notably:

- reporting performance scores and trained models,
- starting, stopping, freezing and continuing execution of arms (and the AutoML processes within them), and
- sharing configuration information.

In the following, the components are described in more detail.

Dispatcher Component

The dispatcher receives a machine learning task referring to a dataset, a description of the dataset, a loss function, a stopping criterion and the type of machine learning problem the task is about (regression, classification or clustering). The user also specifies to the dispatcher which variables are input and which are output variables in the dataset. Optionally, HAMLET can deduce which type of data format is contained in each column automatically by applying programmatic heuristics from best practices obvious to someone skilled in the art.

In one embodiment, the type of machine learning can be inferred from the dataset directly. If all target variables indicated in the dataset description are of the categorical type, it is considered a classification problem. On the other hand, if the target variables contain floating point numbers, it is a regression problem. If no target variable is contained in the dataset, it is of the clustering type.
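
A minimal sketch of such an inference heuristic, assuming the dataset has been loaded as a pandas DataFrame and the target columns are known from the dataset description, might be:

import pandas as pd

def infer_task_type(df: pd.DataFrame, target_columns: list) -> str:
    """Heuristically deduce the type of machine learning task from the target variables."""
    if not target_columns:
        return "clustering"      # no target variable present in the dataset
    targets = df[target_columns]
    if all(pd.api.types.is_float_dtype(s) for _, s in targets.items()):
        return "regression"      # floating point targets
    if all(isinstance(s.dtype, pd.CategoricalDtype) or pd.api.types.is_object_dtype(s)
           for _, s in targets.items()):
        return "classification"  # categorical targets
    return "classification"      # default assumption for, e.g., integer-coded labels

df = pd.DataFrame({"x": [1.0, 2.0], "label": ["a", "b"]})
print(infer_task_type(df, ["label"]))  # -> classification
print(infer_task_type(df, []))         # -> clustering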

Upon receiving the dataset, the HAMLET dispatcher stores it in a dataset storage (e.g., a database table), assigns a unique dataset identifier returned to the user, and associates it to the user's task identifier. Upon receiving the dataset description, the HAMLET dispatcher stores it in a dataset description storage (e.g., a database table), assigns it a unique dataset description identifier, and associates it to the dataset identifier. Upon receiving the task specification, the HAMLET dispatcher stores it in a task storage (e.g., a database table) and assigns a unique task identifier.

In different embodiments, the following features are provided:

- In a particular embodiment, the interface A for the user to specify the task, upload the dataset, and specify the description of the dataset is a representational state transfer (REST)-based interface that carries the task and dataset descriptions in standard formats such as extensible markup language (XML) or JavaScript object notation (JSON) and allows dataset upload via standard file upload mechanisms.
- In a particularly relevant embodiment, the dataset is a tabular dataset stored in a comma separated value (csv) file format. The dataset is uploaded via interface A via standard file upload mechanisms known from internet services.
- In a beneficial embodiment, HAMLET offers the user to specify which of the supported arms to consider for solving the task. Moreover, the user may also specify concrete parametrizations of the components within the arms (e.g., exclusions of algorithms, or certain hyperparameter ranges) to HAMLET.
- In a beneficial embodiment, when presented a new user task, HAMLET can choose to disable arms that do not fit the task or recommend arms which are especially promising. HAMLET uses features describing the task's dataset, e.g., the size, the data types present in the dataset, if there are missing features, etc. Also, the type of machine learning task (e.g., regression vs. clustering) is meaningful to consider for the decision about which arms to apply. To identify meaningful arms to propose, approaches known from the state of the art for ‘Meta-Learning’ can be applied.
- In a beneficial embodiment, the user has to provide authentication credentials prior to being able to use HAMLET.

Deployment

In the following, it is described how the HAMLET design allows scaling from a single computer to a distributed cloud setting with a massive number of parallel users and tasks.

As task solving may take a considerable amount of time, and as many parallel tasks may be requested by different users, it is necessary to be able to scale the system capacity. It is beneficial that HAMLET is designed to be compatible with standard mechanisms for scaling up web services in cloud service settings and can be employed to support many concurrent user requests.

A favorable embodiment of HAMLET encapsulates the MasterBandit as a microservice component, e.g., in a docker image. In this favorable embodiment, a cloud orchestrator component routes requests from the dispatcher to the MasterBandit. For example, Kubernetes can be used to manage overall cloud system resources and, within these, many instances of MasterBandit containers. Similarly, a beneficial embodiment realizes the HAMLET arm component as a microservice in a docker container. Each arm is a separate container. This way, the HAMLET MasterBandit microservice can be realized in one of two options:

1. If its own container environment offers a docker server instance, it can run arms inside its own container. This option shares virtualized cloud resources assigned to the MasterBandit container among the MasterBandit's main loop (see below) and the different arms' tuning calculations (see below), or
2. Alternatively, if the cloud environment in which HAMLET is deployed offers a docker server, the MasterBandit can request starting, freezing, stopping and termination of arm containers as needed in its main loop. In this way, the command set of interface D for controlling execution of the arms can be replaced by docker container execution control commands. Therefore, interface D becomes simpler.
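
Under option 2, the execution control commands of interface D map onto container lifecycle operations. A rough sketch using the docker-py client is given below; the arm image name and environment variables are hypothetical:

import docker

client = docker.from_env()  # talks to the docker server offered by the cloud environment

# Start an arm container, e.g., for an auto-sklearn based arm (image name is hypothetical).
arm = client.containers.run(
    "hamlet/arm-autosklearn:latest",
    environment={"TASK_ID": "task-42", "TIME_BUDGET": "900"},
    detach=True,
)

arm.pause()    # freeze the arm's AutoML process without losing its state
arm.unpause()  # resume exactly where it was frozen
arm.stop()     # terminate the arm when the MasterBandit de-selects it for good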

FIG. 2 depicts two general options for deploying HAMLET as examples. The letters 1 and N denote the usual cardinality of relationships among components: there are multiple MasterBandits MB associated to a single dispatcher D, and multiple arms A (depending on the task configuration) to a single MasterBandit MB. Requests among components may run via docker orchestration components, such as Kubernetes, or use direct references to instantiated docker containers as provided by a docker server component. A system may host multiple dispatchers, instantiated on user request by a Kubernetes orchestrator, e.g., for load-balancing purposes, leveraging standard mechanisms defined in the state of the art.

- In option 1, each of the microservice components is encapsulated in an individual docker container, but all docker containers are contained within a larger “outside” docker container. This might be a suitable deployment for a single PC or small server.
- In option 2, the dispatcher, MasterBandit, and arm components are contained in individual docker containers.

In another embodiment, all microservice components (dispatcher, MasterBandit, arm) can be encapsulated in operating system (OS) processes or even computing threads instead of relying on, e.g., docker as a virtualization technique, with the associated standard mechanisms of execution runtime control. In this embodiment, freezing of arm execution may be realized via interface D as described above, or alternatively standard operating system process control commands are used to freeze and/or resume execution of arms.
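
When arms run as plain OS processes on a POSIX system, freezing and resuming can rely on standard job-control signals; a minimal sketch (the arm command line is hypothetical) might be:

import os
import signal
import subprocess

# Launch an arm as an ordinary OS process (the command line is hypothetical).
arm_process = subprocess.Popen(["python", "run_arm.py", "--framework", "auto-sklearn"])

os.kill(arm_process.pid, signal.SIGSTOP)  # freeze the arm's execution
os.kill(arm_process.pid, signal.SIGCONT)  # resume it exactly where it was frozen
arm_process.terminate()                   # stop the arm for good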

A particularly favorable embodiment relies on a mixture of both encapsulations: the dispatcher microservice component is encapsulated as a docker image. Another docker image bundles the MasterBandit and the necessary framework libraries for realizing the arms as processes inside the same docker container. This way, the execution control commands (starting, stopping, freezing and resuming) of arms in interface D are realized by standard OS process control mechanisms, which simplifies implementation of interface D. In a favorable deployment, the dispatcher can then request execution of a MasterBandit in response to a user request via, e.g., a docker orchestrator such as Kubernetes.

In a favorable deployment, the microservice component dispatcher can (in response to a user's request) request the instantiation (and management) of a MasterBandit container from a docker server, or an orchestrator such as Kubernetes.

In a favorable deployment, the microservice component MasterBandit can request instantiation (and management) of a suitable number of arm containers from a docker server to solve a user's HAMLET task.

In an embodiment, different arm container images exist for different frameworks and are instantiated based on the HAMLET task specification. In this embodiment, the MasterBandit has to indicate the exact arm container type in its request to the docker server. The MasterBandit may also pass framework-specific configuration information pertaining to the HAMLET task (e.g., hyperparameter ranges to use or algorithms to consider).

In another embodiment, a general purpose arm container is configured with all possible frameworks supported by HAMLET and available for starting in the docker server. In this variant, the MasterBandit simply requests starting of the desired number of arm containers from the docker server and passes to the containers the necessary configuration to individualize their behavior, e.g., to behave as an auto-sklearn arm, in addition to framework-specific configuration information pertaining to the HAMLET task (e.g., hyperparameter ranges to use or algorithms to consider) for that arm.

MasterBandit and Arm Component

Different existing frameworks for AutoML (e.g., auto-sklearn or PMF), NAS (e.g., auto-keras) or single-purpose machine learning tuning algorithms (e.g., the differential evolution-based approach described in Mischa Schmidt, et al., “On the Performance of Differential Evolution for Hyperparameter Tuning,” arXiv:1904.06960v1, (Apr. 15, 2019), or the Bayesian optimization approaches for tuning a hyperpartition) are integrated as choices, or arms, into the HAMLET MasterBandit. The MasterBandit can select from those for solving a particular user task. The different choices may or may not be applicable to any given user task. HAMLET arms can also integrate remote AutoML frameworks such as Google Cloud AutoML via the remote framework's client libraries (if provided) as tuners. Depending on the functionality of the remote framework, some of the below embodiments of execution are not possible, e.g., the GOOGLE AutoML framework does not support freezing of execution of training.

For each task, a single MasterBandit component interacts with one or more arms, each abstracting from libraries for AutoML and NAS as mentioned above and Cloud AMLaaS services such as GOOGLE AutoML by integrating their client libraries, or certain customized versions of those external libraries. While running, the arms execute the libraries on the user task. During execution, these libraries train many machine learning models and record associated scores. Performance scores of trained models and the models themselves are stored in the databases in FIG. 1.

During execution of an arm, newly found models are continually reported and stored as soon as they are found, together with the reached scores and the training time (time needed to find these models). The HAMLET MasterBandit can request the scores and training times for each arm at each iteration.

The logic to control the execution of solving the specified task resides in the HAMLET MasterBandit component in FIG. 1. The MasterBandit decides on resources to be used for the arms, i.e., which arm to run in which time interval. The MasterBandit can pause/resume the running of arms.

In a beneficial embodiment, the MasterBandit executes arms in parallel for a configured or user specified time budget. This setting is beneficial if vast compute resources are available for solving the user's task. In this setting, the MasterBandit does not need to select among the different arms for execution. In this embodiment, the MasterBandit may monitor execution (training of models from all arms) in a main loop or at specified time intervals until the task's associated stopping criterion is met. In a beneficial variant, the MasterBandit may apply learning curve extrapolation (LCE) as described below for the purpose of presenting diagnostic information to the user (e.g., the predicted performances of the different arms).

In another embodiment, the MasterBandit component multiplexes different arms using compute resources in time via a suitable selection mechanism. For this, the MasterBandit decides the sequence in which HAMLET arms may run. In this setting, it is beneficial if the MasterBandit applies machine learning to increase the performance of its decisions. To do so, the MasterBandit executes in a main loop for the time until the task's associated stopping criterion is met. In a variation, the MasterBandit allows a subset of arms to execute in parallel.

In a particularly beneficial embodiment denoted LCE, the HAMLET MasterBandit uses a novel and inventive improvement of a multi-armed bandit algorithm that extrapolates, during solving the user's task and for all arms, the expected performance to the end of the task's remaining time budget. The extrapolation is based on the arms' achieved best performances over their individual execution time within the task. HAMLET selects the one with the highest extrapolated performance. In this embodiment, only the different arms' maximum achieved performances over time are considered in the extrapolation.

In a variant of the LCE embodiment (see algorithm 1 below), when HAMLET takes the decision to execute an arm, it assigns a time interval (the interval can be a configuration parameter), executes the arm for the time interval, and freezes execution of the arm via mechanisms pertaining to the deployment, e.g., provided by the arm microservice (e.g., via interface D), the virtualization environment encapsulating the arm container, or, e.g., process control. The MasterBandit checks the performance scores of the trained models and updates the learning algorithm (e.g., the multi-armed bandit's statistics and associated extrapolation curves) to inform the next loop iteration's selection step. If in a later iteration the same arm is chosen again, the arm's execution can be resumed and directly start where it was frozen before, thus no computing time is lost. This approach is particularly beneficial as the MasterBandit can flexibly assign and re-assign arm execution: even while an arm has not increased its performance score yet, time has advanced while executing the arm, so the corresponding learning curve can be updated to consider whether the compute resources should be re-allocated.

In a specific embodiment, a multi-armed bandit algorithm such as the well-known UCB1 algorithm (e.g., provided via the BTB library as referenced in Micah J. Smith, et al., “The Machine Learning Bazaar: Harnessing the ML Ecosystem for Effective System Development,” arXiv:1905.08942v3 (Nov. 12, 2019)) can be used to select which arm to execute and learn, based on the performance achieved by the selected arms. In another embodiment, Best-K or Best-K-Velocity variants of the multi-armed bandit algorithm can be used in the MasterBandit (e.g., provided via the BTB library as referenced in Micah J. Smith, et al., “The Machine Learning Bazaar: Harnessing the ML Ecosystem for Effective System Development,” arXiv:1905.08942v3 (Nov. 12, 2019)).

In another embodiment, when HAMLET takes the decision to execute an arm, it waits for the arm to complete training of a pre-configured number of models (e.g., one). HAMLET checks the performance scores of the trained models and updates the learning algorithm (e.g., the multi-armed bandit algorithm's UCB1 statistics or associated extrapolation curves) to inform the next loop iteration's selection step. This approach can make use of LCE if the arms' learning curves are reported as best performance over the number of models trained by the different arms (effectively reinterpreting model training as a unit of time instead of, e.g., wallclock time or CPU time).

In an embodiment, the MasterBandit uses the arctan function to fit and extrapolate the learning curve.

In an embodiment, the MasterBandit uses a neural network to fit and extrapolate the learning curve. In a particularly favorable embodiment, the neural network for learning curve extrapolation has been pre-trained on exemplary learning curves from machine learning problems and datasets.

In a particular embodiment, the MasterBandit considers only the convex hull of the arms' recorded performance scores, i.e., their best scores as reported, to construct the arms' respective learning curves. In a beneficial embodiment, the MasterBandit fills the time intervals between the arms' recorded scores by creating artificial performance score samples for the timestamps between the arms' recorded scores.
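
A sketch of how such a monotone best-so-far curve could be constructed and densified with artificial samples is given below; the sampling grid and the previous-value interpolation are assumptions of this sketch:

import numpy as np

def best_so_far_curve(times, scores, grid_step=10.0):
    """Keep only the running maximum of the reported scores and resample it on a
    regular time grid, creating artificial samples between reported scores."""
    order = np.argsort(times)
    t = np.asarray(times)[order]
    best = np.maximum.accumulate(np.asarray(scores)[order])
    grid = np.arange(t[0], t[-1] + grid_step, grid_step)
    # Previous-value interpolation: between two reports the best score stays constant.
    idx = np.searchsorted(t, grid, side="right") - 1
    return grid, best[idx]

times = [35.0, 80.0, 200.0]
scores = [0.61, 0.58, 0.70]  # the dip at t=80 is ignored by the running maximum
grid, curve = best_so_far_curve(times, scores)
print(curve[:5])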

One particular approach to the MasterBandit's main loop is described in the following algorithms:

Algorithm 1:
  # model persistence and statistics persistence is skipped for simplicity
  # of exposition
  #
  # The MasterBandit loops until the task's constraint specified by the
  # user is met. This constraint could be a time budget.
  # In the first iteration, all arms are tried once for giving the
  # MasterBandit a first set of performance scores.
  # After the first iteration, the MasterBandit
  #   updates the arms' learning curves using the arms' scores,
  #   extrapolates the arms' learning curves until the end of the remaining
  #     budget under the assumption all remaining budget was dedicated to
  #     the individual arm under consideration.
  #   The Master chooses among the learning curves the most favorable
  #     arm and executes it for a configured time interval.
  #     Stochastic exploration can be used by adjusting the
  #     epsilon parameters in Algorithm 1.2.
  #   Deduct the time interval from the remaining budget, check if the
  #   constraints are met.
  #   If not met, repeat the loop, otherwise stop training.
  While ConstraintMet == False:
    Iteration k == 1: Each arm is run once for a predefined amount of Budget
    For each iteration k > 1:
      For each arm a:
        Update LC_a(t) = [x_a = (t1, . . . , tnow), y_a = (score_t1, . . . , score_tnow)]
        Extrapolated_LC_a = UpdateExtrapolatedLearningCurve(LC_a(t), Budget_remaining)
      Next_arms, Next_Budget = MasterChooseArm(Extrapolated_LC_a for each arm a,
                                               Budget_remaining, Desired_Score)
      Pause all arms not in Next_arms for Next_Budget seconds
      Resume all arms in Next_arms for Next_Budget seconds
      ConstraintMet = CheckConstraint(Budget_remaining, Desired_Score)
  With
    t: time
    score: scores reached by best models
    LC: Learning_Curve
    k: iteration
    a: arm
    LC_a(t): Learning Curve for arm a, depending on time t up until time tnow
    x_a: time (training time for the related score, seconds arm a has been running
         up until the resp. score was found)
    y_a: scores (scores that have been found by arm a, monotonically increasing)
    Next_arms: List of arms which were chosen by the MasterBandit to run in the
         next iteration
    Next_Budget: Budget, in seconds, for the next iteration, chosen by the MasterBandit

Algorithm 1.1 UpdateExtrapolatedLearningCurve(LC_a(t), Budget_remaining):
  # The Learning Curve for arm a is extrapolated: Extrapolated_LC
  # Extrapolated_LC estimates the future scores over time up until the
  # maximum available Budget (Budget_remaining).
  # A curve is fit, using standard regression algorithms, based on the
  # scores observed so far (y) over time (t). With standard regression
  # algorithms we mean e.g. fitting a curve using an ilog or
  # arctan function (for classification tasks) or using e.g. SVM
  X = LC_a(t).x_a
  y = LC_a(t).y_a
  Extrapolated_LC = curve_fit(X, y)
  Return Extrapolated_LC

Algorithm 1.2 MasterChooseArm(Extrapolated_LC_a for each arm a, Budget_remaining, Desired_Score):
  # The Master chooses an arm based on an objective: It can either choose
  # the arm which is expected to have a user defined desired accuracy
  # first (op = 1), or it can choose the arm which is expected to have
  # the highest score in the remaining amount of time (op = 2).
  # The following shows a MasterBandit version where, due to very limited
  # resources, only one arm per iteration should be run (it can be
  # adapted to run multiple arms per iteration in parallel).
  # Via the epsilon parameter, stochastic exploration behavior is controlled.
  If op == 1
    With probability epsilon1:
      Next_arm = argmin_a(Extrapolated_LC_a.x where
                          Extrapolated_LC_a.y > Desired_Score)
    With probability (1 - epsilon1):
      Next_arm = random
  If op == 2
    With probability epsilon1:
      Next_arm = argmax_a(Extrapolated_LC_a.y where
                          Extrapolated_LC_a.x == t_max_available)
    With probability (1 - epsilon1):
      Next_arm = random
  # Assign Budget for the next Iteration based on the Remaining Budget, the Desired
  # Accuracy, and the Extrapolated LC for each arm:
  Next_Budget = AssignBudgetArm(Budget_remaining, Desired_Score, Extrapolated_LC_a)
  Return Next_arm, Next_Budget
  With
    x: time (training time for the related score, seconds the arm has been running up
       until the resp. score was found)
    y: scores (scores that have been found by the arm, monotonically increasing)

Algorithm 1.3 CheckConstraint(Budget_remaining, Desired_Score):
  # Check whether all Budget is used and whether the Desired accuracy
  # is already reached
  Constraint_met = False
  if Budget_used >= Budget_remaining:
    Constraint_met = True
  if Highest_Score_Reached_above_all_arms >= Desired_Score:
    Constraint_met = True
  Return Constraint_met

In a beneficial embodiment, HAMLET tracks and stores models' wallclock execution times on a certain portion of the dataset (e.g., the test set commonly used to calculate the performance scores) in addition to the models' performance scores. This beneficially allows later presenting the user not only the highest performing models, but also the fastest to execute. Also, this may inform ensemble building as indicated below.

When operating on the user's task, the HAMLET MasterBandit stores trained models, ensembles (see below), and achieved performance statistics in corresponding databases as indicated in FIG. 1. Alternatively, the arms themselves store trained models and associated performance statistics in the databases during execution instead of leaving storage to the MasterBandit.

On user request, the HAMLET dispatcher provides access to these stored models and statistics. In a beneficial variant, HAMLET may be configured to only retain a certain percentage or a certain number of the top performing models. This configuration may be provided by the user along with the task definition, or it can be a HAMLET configuration parameter.

In an embodiment, the MasterBandit can trigger the dispatcher to notify the user, e.g., via an email address associated to the user or associated to the user's task (provided then as part of the task definition), that the user's task is finished.

In an embodiment, the MasterBandit is configured by a human system administrator with the permissible computing resource usage. The MasterBandit can then select the appropriate mode (running all arms in parallel, or multiplexing in time).

In another beneficial embodiment, the MasterBandit can be informed by the deployment orchestration technology (e.g., the MasterBandit queries the orchestrator, or the orchestrator notifies the MasterBandit) or a suitable cloud system load information service about the permissible computing resource usage. The MasterBandit can then switch between the mode of running all arms in parallel and the mode of multiplexing in time.

In a beneficial embodiment, HAMLET can be provided as a Meta-AutoML cloud-based system.

HAMLET with Ensembling

In a beneficial embodiment, HAMLET also offers to build ensembles based on all or a subset of the task's trained models. For this, models created from different tuners (and frameworks) can be combined in ensembles. Specifically, HAMLET may select a preconfigured number or proportion of the top performing models. In another variant, HAMLET may select the fastest to execute models. In yet another embodiment, HAMLET may select models based on the types of machine learning algorithms, so as to diversify the algorithms constituting the ensemble. Three different embodiments to schedule the ensembling are presented:

-   In one embodiment, the MasterBandit executes model ensembling calculations after the task's stopping criterion is met.
-   In another embodiment, during the main loop, the MasterBandit executes ensembling calculations at specified intervals, e.g., every 5 iterations or every 20 seconds (see the sketch after this list).
-   In the third embodiment, during the main loop, HAMLET treats model ensembling calculations as a special form of tuner, i.e., it is treated as one of the MasterBandit's arms to select from, with the consequences and implications described above.
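The second scheduling option could, for example, be realized as sketched below; master_bandit and its methods (stopping_criterion_met, run_next_arm, build_ensemble) are hypothetical placeholders used only to illustrate the interval-based trigger.

```python
import time

def main_loop_with_ensembling(master_bandit, every_n_iterations=5, every_n_seconds=20.0):
    """Run the bandit main loop and trigger ensembling at specified intervals."""
    last_ensemble = time.monotonic()
    iteration = 0
    while not master_bandit.stopping_criterion_met():
        master_bandit.run_next_arm()                      # one bandit iteration
        iteration += 1
        due_by_iterations = iteration % every_n_iterations == 0
        due_by_time = time.monotonic() - last_ensemble >= every_n_seconds
        if due_by_iterations or due_by_time:
            master_bandit.build_ensemble()                # ensemble the models trained so far
            last_ensemble = time.monotonic()
```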

In a particular embodiment, HAMLET uses the MetaBags concept for ensembling calculations (see Jihed Khiari, et al., “MetaBags: Bagged Meta-Decision Trees for Regression,” which is incorporated by reference herein). For this, HAMLET creates different bootstrap samples of the task's dataset to calculate MetaBags-specific meta-features. In another embodiment, HAMLET uses model averaging (averaging all constituent models' predictions) as a means for ensembling.
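For the model-averaging variant, a minimal sketch (assuming trained regression models exposing scikit-learn-style predict methods) could look as follows:

```python
import numpy as np

def ensemble_average_predict(models, X):
    """Average the constituent models' predictions (simple model averaging)."""
    predictions = np.stack([model.predict(X) for model in models], axis=0)
    return predictions.mean(axis=0)
```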

HAMLET with Access to Massive Parallelism, Walltime Budget

In another embodiment, HAMLET is deployed in a cloud setting with an abundance of computing resources available to solve a user's task, with only a total wallclock time budget specified. In this setting, the MasterBandit can run all its arms to be used in the specified task in parallel as described above for the budget given. The frameworks in the arms, however, may face inherent challenges to scale satisfactorily to, e.g., a large number of CPUs. Therefore, the HAMLET MasterBandit can choose to:

-   Modify the configuration of each arm such that the respective arm focuses on a limited search space, e.g., by limiting the number of algorithms to consider in the arm, and/or by reducing the range of values for specific hyperparameters to consider in the arm's framework, and
-   Make up for the limitation in the modified arms by adding complementary arms that are configured to cover the algorithms and/or hyperparameter ranges that the modification removed from the modified arms' configuration scopes.

HAMLET for Particular Use Cases

A particular embodiment of HAMLET can be advantageously used for Smart Cities. A set of cameras monitoring pedestrian traffic feeds into a demographics detection engine such as FieldAnalyst by NEC CORP. FieldAnalyst generates an anonymized breakdown of demographic statistics of pedestrians (gender, age bracket) over time as they pass through the monitored area. This data, together with weather data and available data on scheduled events in the vicinity of the monitored area, serves as input to HAMLET to learn predicting future pedestrian traffic demographics. Upon solving the task, HAMLET produces a set of models which are able to predict with a certain accuracy the upcoming pedestrian demographics. Such predictions can then constitute the base for dynamic traffic control decisions (e.g., to lower traffic in case of predicted crowding), or inform marketing for shops to, e.g., prepare their offerings for upcoming crowds of pedestrians.

In another embodiment of HAMLET for Smart Cities, a set of cameras monitoring road traffic feeds into a flow analytics engine that can, e.g., detect the number of cars, large vehicles (trucks, buses, etc.) and bikes in a time interval. This traffic flow demographics data, together with weather data and other data on scheduled events in the city, serves as input to HAMLET to learn predicting future traffic flow demographics, e.g., the share of trucks vs. cars vs. bikes. Upon solving the task, HAMLET produces a set of models which are able to predict with a certain accuracy the upcoming traffic flow demographics. Such predictions can then constitute the base for dynamic traffic control decisions (e.g., to lower traffic in case of predicted congestion), for decisions to, e.g., dispatch additional traffic police for managing traffic, or for decisions on, e.g., requests to reroute a certain share of drivers via an interface to a navigation system provider.

An embodiment of HAMLET can be advantageously used for predictive control in Smart Buildings for energy optimization. A Smart Building is equipped with sensors measuring room temperatures that are accessible via an Internet-of-Things (IoT) infrastructure platform, e.g., via FIWARE. Alternatively, or in addition, the building's hydronic heating system operation status (on, off, temperatures) is accessible, e.g., from its building management system. This sensor data, together with relevant weather data, can be provided to HAMLET to identify a highly accurate predictive machine learning model, e.g., for several hours or even days in advance (especially when using weather forecast services, e.g., from the internet). In a beneficial variant, HAMLET can also be applied to energy meter readings to identify a machine learning model able to predict how the heating system's operational settings and the weather influence the heating system's energy use. These predictive models (e.g., room temperature prediction and energy consumption) can then be used by an optimization algorithm such as a genetic algorithm, differential evolution or particle swarm optimization to evaluate under which heating system settings the building will meet or violate building-specific target room temperature ranges, and how to optimize heating system energy usage.

An embodiment of HAMLET can be advantageously used by hospitals for predicting patient discharge. For hospital management, it can be advantageous to know when a patient is likely to be discharged. This can be achieved by encoding patients' health data, physiological measurements (e.g., heart rate and blood pressure) and general patient information (e.g., age, gender) in numeric values by methods known from the state of the art, and providing this numeric data, together with the number of days the patients stayed in the hospital, to HAMLET. HAMLET will efficiently produce predictive machine learning models that are able to predict, for new and already admitted patients, how long the respective patient will stay in hospital. This discharge information can then be used, e.g., for the hospital's resource planning.

An embodiment of HAMLET can be advantageously used for quantitative trading. Informing trading decisions of investors can be achieved by providing securities' fundamental data and their time series of trading prices to HAMLET to identify machine learning models that a) classify securities that should be bought or sold, and/or b) predict future security prices (e.g., the closing price of a stock one week from when the prediction was executed).

An embodiment of HAMLET can be advantageously used for the e-health domain. Medical data is given, and this data can serve as input to HAMLET to learn classifications of sickness based on the data. An example is as follows: data which has been collected at hospitals to analyze factors of diabetes (e.g., age, time in hospital, medication, other sicknesses, etc.). The task is to classify the risk of a diabetes patient p having to be readmitted to the hospital for an emergency diabetes case based on the data given. Upon solving the task, HAMLET produces a set of models which are able to classify with a certain accuracy the emergency readmission risk of a patient. Such classifications can then be used to support doctors in their decision making and provide hints and signs for particular treatment. This can also be applied to other sickness cases.

An embodiment of HAMLET can be advantageously used for air quality predictions. A set of sensors monitors road traffic and weather, as well as air quality (e.g., levels of SO₂ or microscopic particulate matter) in a time interval. This serves as input to HAMLET to learn predicting future air quality, e.g., the levels of SO₂ or microscopic particulate matter. Upon solving the task, HAMLET produces a set of models which are able to predict with a certain accuracy the upcoming air quality values. Such predictions can then constitute the base for dynamic traffic control decisions (e.g., to lower traffic in case of predicted high pollution), or for decisions on public transport. Another use of such predictions is to build end user apps informing the general public of times when certain areas of interest are predicted to be highly polluted.

In general and independent of use case, embodiments of the present invention provide the following improvements/advantages:

1) Design of a meta-AutoML method that can easily integrate new AutoML frameworks (for NAS, or for traditional ML) as arms as they are developed in the state of the art, including a novel and inventive improved multi-armed bandit algorithm to choose among different applicable AutoML frameworks (arms) by learning and extrapolating their performance for efficient computation resource usage by time multiplexing, which avoids the drawback of multi-armed bandit algorithms which base their calculations on statistics of past performances.
2) Design of a system which is scalable from a single PC to massively parallel cloud deployments on different hardware settings, and which can also cope with running frameworks (arms) in parallel or multiplex their execution in time by using a flexible system architecture which, e.g., encapsulates the bandit as a microservice component which manages the deployment of arm microservice components (with budget awareness).
3) Provision of an enhanced multi-armed bandit algorithm with LEC which improves the assignment of computing resources within a time budget to framework execution in scenarios of scarce computation resources. Better scores can be found when limited resources are given, or the desired scores can be found faster.
4) Support of deep learning as well as traditional ML, and support for different types of learning problems (regression, classification, clustering). The ability to use more algorithms leads to better scores.
5) Provision of built-in ensembling of models across many different frameworks. This leads to better scores as models from across frameworks are used (more good models incorporated results in better scores).

In an embodiment, a method for automatically selecting, tuning and training machine learning algorithms for user-specified machine learning tasks comprises the steps of:

1) Receiving a dataset from a user.

2) Receiving a particular machine learning task from a user.

3) Controlling execution of multiple instantiations of AutoML frameworks and/or algorithms (arms) according to available compute resources and time budget, using the learning and extrapolation of performance for the remaining task time budget, with the goal of finding trained models for the particular task.

4) Collecting a multitude of trained models for the task for later user retrieval, as well as finding the highest scoring model for the particular task.

In comparison to a user invoking training of a task on open source frameworks integrated as arms in HAMLET on his or her own, without using the multi-armed bandit algorithm for resource management, embodiments of the present invention provide a number of improvements. In the former case, the user would have to store and compare the resulting models by him/herself. This would implicate the following drawbacks:

-   Decisions on what frameworks can be used for a given dataset need to be made by the user (the MetaAutoML effect of this step is lost), and
-   Parallel deployment of frameworks needs more computation power than integration with the HAMLET multi-armed bandit algorithm, which focuses on deployment of promising (by extrapolation) frameworks.
-   Lower scores for the same amount of resources, and/or more time and/or hardware resources needed for the same score.
-   The user could choose only one framework and only invoke training for a given problem on one framework (via the framework API). This would likely result in the best models (highest score) not being found or used.

In the following, embodiments of HAMLET are described along with experimental results demonstrating the improvements in performance and advantages provided thereby. For example, providing for automated algorithm selection and hyperparameter tuning facilitates the application of machine learning. HAMLET outperforms traditional multi-armed bandit strategies which look to the history of observed rewards to identify the most promising arms for optimizing expected total reward in the long run. The inventors have recognized that, when considering limited time budgets and computational resources, these traditional strategies apply a backward view of rewards which is inappropriate as the bandit does not look into the future for anticipating the highest final reward at the end of a specified time budget. In contrast, HAMLET extends the bandit approach with learning curve extrapolation and computation time awareness for selecting among a set of machine learning algorithms. In experiments with 99 recorded hyperparameter tuning traces from prior work, all studied HAMLET variations exhibit equal or better performance than other bandit-based algorithm selection strategies. The best performing HAMLET variant combines learning curve extrapolation with the well-known upper confidence bound exploration bonus. In total, that variant achieves a lower mean rank than all non-HAMLET policies with statistical significance at the 95% level.

HAMLET allows selection of the base learner to be applied to a dataset. In an embodiment, an iterative approach models the selection of the base learner and the optimization of its hyperparameters as a hierarchical problem. A multi-armed bandit focuses on selecting the base learner, and a specialized component (also referred to as the tuner) is responsible for tuning that respective base learner's hyperparameters. This approach is easily extensible with base learners by integrating them as additional arms. HAMLET applies to realistic settings where AutoML faces a limitation of resources such as available computational power and the time available to solve the machine learning problem. The following discussion addresses the extreme case of a single CPU available for solving a machine learning task within a strict wallclock time budget. In this setting, the traditional multi-armed bandit's approach is not optimal as it requires observing a complete function evaluation, i.e., training a parametrized base learner on the dataset, before being able to update the associated arm's statistics. Additionally, most multi-armed bandit algorithms assume stationary reward distributions, which usually is not true as tuning algorithms increase in achieved performance over time. Finally, the typical AutoML setting aims to achieve the maximum possible performance, not to maximize the average reward sums over several repeated trials. HAMLET provides an improved multi-armed bandit approach by accounting for time explicitly, and by learning the different arms' learning curves and extrapolating them to the end of the wallclock time budget under consideration of the computation already spent on each arm. The combination of learning curve extrapolation and accounting for computation time improves the performance of multi-armed bandits in algorithm selection problems.

The empirical evaluation discussed in the following uses the 99 sets of traces for six different base learners. The evaluation shows that even simple approaches to learning curve fitting provide gains for regimes of tight time budgets. Overall, the best performing HAMLET variant achieves with 95% confidence a better performance than all non-HAMLET bandits used in the experiments.

The basic form of the multi-armed bandit problem can be described as follows: An agent is faced repeatedly with a choice among I different actions. After each choice, the agent receives a numerical reward chosen from a stationary probability distribution that depends on the selected action. The objective is to maximize the expected total reward over some time period or time steps. Through repeated action selections, the agent can maximize the winnings by concentrating on the best arms. If the estimates of the action values are maintained, there is at least one action at any time step whose estimated value is largest; these are the greedy action(s). When one of them is selected, this is called exploitation. If a non-greedy action is selected, it is called exploration, as it allows improving the estimate of the non-greedy action's value. Exploration is needed because there is always uncertainty about the accuracy of the action-value estimates. The greedy actions are those that look best at present, but some of the other actions may actually be better.

A simple exploration technique would be to behave greedily most of the time, but with small probability ϵ, instead select randomly from among all the actions with equal probability, independently of the action-value estimates. That method is called ‘ϵ-greedy’. An advantage of ϵ-greedy is that as the number of time steps increases, the bandit will sample every action an infinite number of times. Therefore, the bandit's action value estimates will converge to the accurate values. Commonly, ϵ-greedy action selection forces the non-greedy actions to be tried, but indiscriminately. Another technique, denoted decaying ϵ, initializes ϵ high and decreases ϵ (and thus the rate of exploration) over time.
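As a generic illustration (not specific to HAMLET), an ϵ-greedy selection rule with an optionally decaying ϵ can be sketched in a few lines of Python; the function names are illustrative only:

```python
import numpy as np

def epsilon_greedy(value_estimates, epsilon, rng=None):
    """Pick an action index: random with probability epsilon, greedy otherwise."""
    rng = rng or np.random.default_rng()
    if rng.random() < epsilon:
        return int(rng.integers(len(value_estimates)))  # explore
    return int(np.argmax(value_estimates))              # exploit

def decaying_epsilon(step, total_steps, eps_start=1.0, eps_end=0.0):
    """Linearly decay epsilon from eps_start to eps_end over total_steps."""
    fraction = min(step / total_steps, 1.0)
    return eps_start + fraction * (eps_end - eps_start)
```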

Another effective way of selecting among the possible actions is the Upper Confidence Bound (UCB) method. It selects actions according to their potential for being optimal. UCB does so by taking into account both their respective value estimates and the uncertainties in those estimates, where actions with lower value estimates, or that have already been selected frequently, will be selected with decreasing frequency over time. UCB bandits select actions based on the upper bound of what is reasonable to assume as the highest possible true value for each action. Each time an action is selected, the epistemic uncertainty in the action value estimate should decrease. On the other hand, each time another action is selected, the epistemic uncertainty increases. One difficulty of UCB bandits is in dealing with non-stationary problems.

The multi-armed bandit problem presented and addressed in accordance with embodiments of the present invention differs from the original problem as rewards are not stationary. When performing algorithm selection, the rewards should increase the more time is spent on the arm, whereas the rate of improvement is unknown. The objective is not to maximize the total reward, but to find the single best reward.

Auto Tune Models (ATM) is a distributed, collaborative, scalable AutoML system, which incorporates algorithm selection and hyperparameter tuning. ATM approaches AutoML by iterating two steps: hyperpartition selection followed by hyperparameter tuning. A hyperpartition includes one specific base learner, as well as its categorical hyperparameters. ATM models each hyperpartition selection as a multi-armed bandit problem. ATM supports three bandit algorithms: the standard UCB-based algorithm (called UCB1), and two variants designed to handle drifting rewards as encountered in the AutoML setting. The variants designed for drifting rewards compute the value estimates for selecting the actions either based on the velocity or the average of the best K rewards observed so far (denoted BestK-Velocity and BestK-Rewards, respectively). Once a hyperpartition has been chosen, the remaining unspecified parameters can be selected from a vector space using, for example, Bayesian Optimization. The Machine Learning Bazaar framework for developing automated machine learning software systems extends the work of ATM and incorporates the same bandit structures. HAMLET differs from both in multiple ways. First, it does not choose between hyperpartitions, but solely between base learners, and therefore does not select categorical hyperparameters. Second, HAMLET uses a novel bandit algorithm which fits a simple model of the learning curve to observed rewards, but selects the action based on an extrapolation of the learning curve to find the highest possible reward given a time budget. Third, ATM and the Machine Learning Bazaar update the action value statistics based on completed function evaluations, i.e., a base learner's test performance after training it on the dataset. In contrast, an embodiment of HAMLET updates training statistics in configurable time intervals. Even if a base learner's tuner did not manage to find better models in a recent time interval, an embodiment of HAMLET tracks (the lack of) progress of the tuner's learning curve, allowing it to switch computing resource assignments based on extrapolating learning curves.

Hyperband is a bandit-based early-stopping method for sampled parametrizations of base learners. It incorporates a bandit that deals with the fact that arms in hyperparameter optimization might improve when given more training time. Hyperband builds on the concept of successive halving: it runs a set of parametrized base learners for a specific budget, evaluates their performances, and stops the worse performing half of the set. When presented with a larger set of possible parametrizations of base learners, Hyperband stops parametrizations that do not appear promising and assigns successively more computational resources to the promising ones that remain. HAMLET differs in that it assigns computational resources based on predicted performance and not on past performance so far. In an embodiment of HAMLET, the bandit is used to decide which algorithm to run, and not which hyperparameter setting to run. Also, the approach to assign budget is different. Hyperband applies the concept of a geometric search to assign increasing portions of the overall budget to a decreasing number of base learner parametrizations. In contrast, an embodiment of HAMLET continues a chosen tuner for a configured time interval. After the interval, the tuner reports updates for the best found models, if any, and HAMLET updates the respective tuner's learning curve.

The term “learning curve” is used to describe (1) the performance of an iterative machine learning algorithm as a function of its training time or number of iterations and (2) the performance of a machine learning algorithm as a function of the size of the dataset it has available for training. For the AutoML challenge addressed by an embodiment of HAMLET, the focus is on extrapolating the former type of learning curves.

It is possible to target hyperparameter optimization of Deep Neural Networks (DNN) using a probabilistic model to extrapolate the performance from the first part of a learning curve for a hyperparameter configuration. For this, a set of parametric functions can be fit for each hyperparameter configuration and combined into a single model by weighted linear combination. Using a Markov Chain Monte Carlo method yields probabilistic extrapolations of the learning curve. Those probabilistic extrapolations are then used for automatic early termination of runs with non-promising hyperparameter settings. It is also possible to extrapolate learning curves for hyperparameter settings and architectures of DNN. Relying on a Bayesian Neural Network (BNN) in combination with known parametric functions, promising candidate samples to which Hyperband can be applied can be found. Predicting the model parameters of parametric functions as well as the mixing weights with the BNN enables transferring knowledge of previously observed learning curves. However, that implies that previous learning curve information is needed to pre-train the BNN for good and fast performance.

A method known as Freeze-Thaw optimization is a Gaussian process-based Bayesian optimization technique for hyperparameter search. The method includes a learning curve model based on a parametric exponential decay model. A positive definite covariance kernel is used to model the iterative optimization curves. The Freeze-Thaw method maintains a set of partially completed but not actively trained models and uses the learning curve model for deciding in each iteration which ones to ‘thaw’, i.e., to continue training.

A regression-based extrapolation model can be used for the extrapolation of learning curves to speed up hyperparameter optimization. The technique is based on using trajectories from previous builds to make predictions for new builds, where a ‘build’ refers to a training run with a specific base learner parametrization. Therefore, it is possible to transform data from previous builds and add a noise term to match the current build and to extrapolate its performance. This extrapolation capability can serve to identify and stop hyperparameter configurations early.

An embodiment of HAMLET uses a less sophisticated learning curve function to demonstrate the general nature of benefits derived from moving from a backward to a forward-looking multi-armed bandit for algorithm selection. Also, an embodiment of HAMLET provides a general approach, not limited to a specific type of base learner such as DNN. In demonstrating the effectiveness of an embodiment of HAMLET, previous learning curves were not relied on. While a transfer of information from previous learning curves is clearly beneficial, this demonstrates that HAMLET improves algorithm selection performance due to a simple learning curve extrapolation, not due to a transfer and reuse of previous information. In an embodiment, HAMLET uses information only from the current AutoML problem. Also, an embodiment of HAMLET extrapolates learning curves for the performance of base learners' tuners and not individual hyperparameter configurations.

Table 2 below provides a list of symbols and notations used in the following to further describe possible embodiments of HAMLET.

TABLE 2

Symbol           Description
B                overall budget given
B_rem            remaining time budget
ϵ₁               HAMLET Variant 1: chance to pick the tuner with the second highest learning curve
ϵ₂               HAMLET Variant 1: chance to pick a tuner at random
ϵ(t)             HAMLET Variant 2: chance to pick a tuner at random, time dependent
I                number of arms
LC^(i)           learning curve function for arm i, e.g., using Eq. (1)
r                vector of predicted rewards of all arms when the budget runs out
r̂^(i), resp. r̂   r^(i), resp. r, after applying the UCB bonus of Eq. (2)
ρ                HAMLET Variant 3: UCB exploration bonus scaling factor
Δt               time interval for HAMLET's main loop
t_x^(i)          execution time of arm i spent until now
r^(i)            predicted reward of arm i when the budget runs out
[x^(i), y^(i)]   learning curve values observed so far for arm i
x^(i)            training time to reach y^(i) for arm i
y^(i)            accuracy values for arm i

According to embodiments of HAMLET, the AutoML algorithm selection problem is modelled as a multi-armed bandit problem, where each arm represents a hyperparameter tuner responsible for one specific base learner. At each iteration, HAMLET chooses which action to take, i.e., selects the arm to pull, based on extrapolations of the arms' learning curves as described below and outlined in Algorithms 2 and 3. After deciding on the action, the bandit continues the execution of the corresponding hyperparameter tuner for a pre-configured time interval Δt. When the interval elapses, execution of the arm is paused and can be resumed at that point in later iterations without loss of information. During execution of the arm in the time interval, the bandit receives all monotonically increasing accuracy values reached in that time interval as well as information about when these accuracies were reached (i.e., the updated so-far learning curve). When the tuner did not find new monotonically increasing accuracy values in that time interval, this information is also incorporated. The bandit then fits a parametric curve to match the learning curve and reduces the remaining budget by Δt. Subsequently, HAMLET continues to the next iteration. When HAMLET faces a new AutoML problem, it tries arms preferably in a Round Robin fashion until it has collected enough values to model learning curves for each arm.

Learning Curve Extrapolation: According to embodiments of HAMLET, learning curves are modelled as the accuracy found by a tuner (y) over training time (x) in seconds. Here, training time includes the time spent on executing the tuner for identifying parametrizations of the base learner and training the parametrized base learner on the dataset. In HAMLET, the learning curve is defined as a monotonically increasing function defined by the maximum accuracies found over training time. An example graph to demonstrate learning curve extrapolation is shown in FIG. 3. In this example, training of the arm has been running for 500 seconds, where the maximum training time of this arm is marked by a vertical line. The accuracy values found up until this point are used to fit a curve, in order to predict future accuracy values over time. From the accuracy scores found by this tuner, only monotonically increasing values are used for the learning curve, and other scores are ignored. For comparison reasons, the actual future learning curve is also shown, since this is the ground truth for the extrapolated learning curve but unknown to the HAMLET bandit. An embodiment of HAMLET aims to optimally use the specified time budget B and find the base learner achieving the highest accuracy by devoting most computational resources to its corresponding tuner. Therefore, HAMLET attempts to predict the maximally attainable accuracy for each tuner. It does so by extrapolating each tuner's learning curve assuming that all remaining budget B_rem was spent on it. Considering that each tuner i has already received an amount of training time t_x^(i), HAMLET uses the tuner's learning curve to predict the accuracy at x = t_x^(i) + B_rem, illustrated in FIG. 3 by a marker.

To investigate if learning curve extrapolation is a meaningful concept for the algorithm selection problem in constrained computational settings, a straightforward parametric function is used to model the tuners' observed accuracies over time. In the problem setting according to embodiments of HAMLET, learning curves are known to be (1) monotonically increasing, (2) saturating functions with (3) values y ∈ [0, 1]. Because of the similar shape and possible fulfillment of prerequisites (1)-(3), an arctangent function with four parameters (a, b, c, d) to translate, stretch and compress is chosen according to an embodiment for this first set of investigations:

y = a·arctan(b(x + c)) + d,  (1)

where scikit-learn's curve fit function is used to fit the parameters of the desired curve.
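The fitting and extrapolation step can be sketched as follows. This is a minimal sketch under stated assumptions: it uses SciPy's curve_fit (one readily available curve fitting routine, chosen here as an assumption) and illustrative helper names, not HAMLET's actual code.

```python
import numpy as np
from scipy.optimize import curve_fit

def lc_model(x, a, b, c, d):
    """Eq. (1): saturating, monotonically increasing arctangent learning curve."""
    return a * np.arctan(b * (x + c)) + d

def fit_and_extrapolate(x_obs, y_obs, t_spent, budget_remaining):
    """Fit Eq. (1) to observed (training time, accuracy) pairs and predict the
    accuracy this arm would reach if it received all of the remaining budget."""
    params, _ = curve_fit(lc_model, x_obs, y_obs, maxfev=10000)
    prediction = lc_model(t_spent + budget_remaining, *params)
    return float(np.clip(prediction, 0.0, 1.0))  # accuracies lie in [0, 1]
```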

HAMLET Variants: HAMLET faces the same exploration/exploitation dilemma as other multi-armed bandit strategies. In the following, three variants of how HAMLET chooses the arm to be run in the next iteration are described. In each, the decision is based on the values of the vector r containing, for each arm i, the accuracy r^(i) predicted by extrapolating the learning curve to time x = t_x^(i) + B_rem. The three variants are outlined in Algorithm 3.

HAMLET Variant 1—Double ϵ-greedy Learning Curve Extrapolation with Fixed ϵ₁ and ϵ₂: In this approach, HAMLET acts in an ϵ-greedy fashion based on the extrapolation of learning curves. After observing in preliminary experiments that often a subset of the tuners perform much better than the rest, the standard ϵ-greedy bandit was modified as follows: With chance ϵ₂, HAMLET chooses an action at random. With chance ϵ₁, HAMLET chooses the tuner with the second highest predicted accuracy. With chance 1−(ϵ₁+ϵ₂), HAMLET takes the greedy action, i.e., argmax(r).
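A minimal sketch of this double ϵ-greedy rule, assuming r is a vector of extrapolated accuracies (one per arm) and the function name is illustrative:

```python
import numpy as np

def variant1_choose_arm(r, eps1, eps2, rng=None):
    """Double epsilon-greedy selection over extrapolated rewards r."""
    rng = rng or np.random.default_rng()
    order = np.argsort(r)[::-1]           # arm indices sorted by predicted accuracy
    u = rng.random()
    if u < eps2:
        return int(rng.integers(len(r)))  # random arm, probability eps2
    if u < eps1 + eps2:
        return int(order[1])              # second highest predicted accuracy, probability eps1
    return int(order[0])                  # greedy arm, probability 1 - (eps1 + eps2)
```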

HAMLET Variant 2—ϵ-greedy Learning Curve Extrapolation with Decaying ϵ: In this approach, HAMLET acts in an ϵ-greedy fashion based on the extrapolation of learning curves. The variant starts with ϵ(t) = 1 and reduces it in B/Δt iterative steps to ϵ(B) = 0, where the notation ϵ(t) denotes the stochastic exploration parameter's time dependence. At each iteration, HAMLET chooses an action at random with the current chance ϵ(t). With chance 1−ϵ(t), HAMLET takes the greedy action.

HAMLET Variant 3—Learning Curve Extrapolation with Exploration Bonus: This variant adds, for each arm, a scaled UCB-based exploration bonus to the learning curve predictions to compute a score as follows:

$\hat{r}^{i} = r^{i} + \rho\sqrt{\frac{2\log n}{n^{i}}}, \quad \rho \geq 0, \qquad (2)$

where n is the number of total iterations, n^(i) is the number of times arm i has been pulled and ρ is the scaling factor of the exploration bonus. At each iteration, HAMLET chooses the arm with the maximum r̂^(i), i.e., argmax(r̂).
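A sketch of Variant 3's selection rule, directly following Eq. (2) and assuming that pull counts are tracked per arm (the function name is illustrative):

```python
import numpy as np

def variant3_choose_arm(r, pulls, total_iterations, rho):
    """Add the scaled UCB exploration bonus of Eq. (2) to the extrapolated rewards
    and return the index of the arm with the maximum score r_hat."""
    r = np.asarray(r, dtype=float)
    pulls = np.asarray(pulls, dtype=float)
    r_hat = r + rho * np.sqrt(2.0 * np.log(total_iterations) / pulls)
    return int(np.argmax(r_hat))
```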

Algorithm 2: Algorithm selection based on learning curve extrapolation.
Data: Overall budget B
B_rem = B
while B_rem > 0 do
  if First iteration then
    for each arm i = 1, . . . , I do
      [x^(i), y^(i)] = TrainAndObserveLC(i, Δt), where [x^(i), y^(i)] describes the learning curve values observed so far
    end
  else
    NextArm = MasterChooseNextArm(r) ;
    [x^(i), y^(i)] = TrainAndObserveLC(NextArm, Δt), where [x^(i), y^(i)] describes the learning curve values observed so far
  end
  for each arm i = 1, . . . , I do
    LC^(i) = ScikitLearn.Curve_Fit(Eq. (1), [x^(i), y^(i)]) ;
    r^(i) = LC^(i)(t_x^(i) + B_rem)
  end
  r = [r¹, r², ... , r^(I)] ;
  B_rem = UpdateBudget( ) ;
end

Algorithm 3: MasterChooseNextArm
Data: Vector r with predicted accuracy for all arms at end of budget
This includes 3 variants ;
if Variant 1 then
  with probability (1 − (ϵ₁ + ϵ₂)): na = argmax(r);
  with probability (ϵ₁): na = second argmax(r);
  with probability (ϵ₂): na = random(1, ... , I);
else if Variant 2 then
  with probability (1 − ϵ(t)): na = argmax(r);
  with probability (ϵ(t)): na = random(1, ... , I);
  where ϵ(t) linearly decreases with increasing time t;
else
  calculate r̂ by applying Eq. (2) to all arms;
  na = argmax(r̂) ;
end
Return Nextarm = na
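Putting the pieces together, the following Python skeleton mirrors the structure of Algorithm 2 under stated assumptions: train_and_observe(arm, dt) stands in for running an arm's tuner for Δt seconds and returning its updated learning-curve points, fit_lc for the curve fitting of Eq. (1), and choose_next_arm for one of the Algorithm 3 variants. None of these names are part of HAMLET's actual code.

```python
def hamlet_main_loop(arms, budget, dt, train_and_observe, fit_lc, choose_next_arm):
    """Skeleton of Algorithm 2: round-robin start, then learning-curve-guided selection."""
    b_rem = budget
    curves = {arm: ([], []) for arm in arms}   # observed (x, y) learning-curve points
    spent = {arm: 0.0 for arm in arms}         # training time spent per arm
    first_iteration = True
    while b_rem > 0:
        if first_iteration:
            to_run = list(arms)                # try every arm once (round robin)
            first_iteration = False
        else:
            # Extrapolate each arm's curve to the end of the remaining budget.
            r = [fit_lc(*curves[arm])(spent[arm] + b_rem) for arm in arms]
            to_run = [arms[choose_next_arm(r)]]
        for arm in to_run:
            curves[arm] = train_and_observe(arm, dt)
            spent[arm] += dt
            b_rem -= dt
    return curves, spent
```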

Traces of experiments (see Mischa Schmidt, et al., “On the Performance of Differential Evolution for Hyperparameter Tuning,” arXiv:1904.06960v1, (Apr. 15, 2019)) which executed hyperparameter tuning for six base learners by an evolutionary strategy were used and are referred to in the following discussion. Running different algorithm selection policies on the recorded experiment traces allows evaluating different bandit policies based on a ground truth. The traces are from classification datasets. Equation (1) above can be adjusted for other datasets, such as regression datasets.

Computational Resources and Setup: Each tuner (and base learner) was executed in a single docker container with only a single CPU core accessible. Parallel execution of different experiments was limited to ensure that a full CPU core was available for each docker container. There was no limit on memory resource availability. The embodiment of HAMLET evaluated in the following executes the bandit logic also in docker containers constrained to a single CPU core. Execution of different experiment runs was likewise limited to ensure that each docker container had access to one full CPU core.

Datasets, Base Learners and Hyperparameter Tuners: Referring to the traces from the experiments used and discussed in the following, algorithm selection was performed for the 49 small (OpenML datasets: {23, 30, 36, 48, 285, 679, 683, 722, 732, 741, 752, 770, 773, 795, 799, 812, 821, 859, 862, 873, 894, 906, 908, 911, 912, 913, 932, 943, 971, 976, 995, 1020, 1038, 1071, 1100, 1115, 1126, 1151, 1154, 1164, 1412, 1452, 1471, 1488, 1500, 1535, 1600, 4135, 40475}) and 10 bigger (OpenML datasets: {46, 184, 293, 389, 554, 772, 917, 1049, 1120, 1128}, with five repetitions each) datasets. Table 3 below, which lists the verification experiment parameter sets, documents the budgets used for the algorithm selection experiments for the small (denoted Experiment 1) and bigger datasets (denoted Experiment 2). The recorded traces contain the six base learners: k-Nearest Neighbors, linear and kernel Support Vector Machine (SVM), AdaBoost, Random Forest, and Multi-Layer Perceptron.

TABLE 3

Parameter    Parameter Set
ϵ₁           {0.01, 0.05, 0.10, 0.20, 0.40, 0.60}
ϵ₂           {0.00, 0.01, 0.05, 0.10, 0.20, 0.40}
ρ            {0.00, 0.05, 0.10, 0.25, 0.50, 0.75, 1.00}
K            {3, 5, 7, 10, 20, 50, 100}
B (Exp. 1)   {150, 300, 450, 600, 900, 1800, 3600} [s]
B (Exp. 2)   {900, 1800, 2700, 3600, 7200, 10800, 21600, 43200} [s]

Policies and Parametrizations for Comparative Evaluation: For HAMLET Variants 1-3, time progresses in intervals of Δt = 10 s with the capability to freeze and continue the execution of different arms (e.g., via standard process control mechanisms). For a fair comparison with the other bandit policies, the experiments take the computation time needed for fitting the arms' learning curves into account. This work compares HAMLET Variants 1-3 with a simple Round Robin strategy (“Round Robin”), a standard UCB1 bandit (“UCB”), and BestK-Rewards (“BestKReward-K”, where K refers to the parameter choice used) and BestK-Velocity (“BestKVelocity-K”) policies, leveraging the BTB library. HAMLET Variant 1 is denoted by “MasterLC-ϵ₁-ϵ₂”, HAMLET Variant 2 is referred to by “MasterLCDecay”, and “MasterLC-UCB-ρ” refers to HAMLET Variant 3. For each parametrizable policy (BestK-Rewards, BestK-Velocity, HAMLET Variants 1 and 3), a simple grid search was run (see Table 3) to identify the best performing parameter for that policy when considering all datasets and all budgets. This mimics a realistic setting, where the data scientist may not know the optimal policy parametrization beforehand and thus parametrizes based on an educated guess.

Analysis: The highest accuracies achieved by each policy parametrization per dataset within a given budget were compared using boxplots to identify the most promising choices of K, ϵ₁, ϵ₂ and ρ. After identifying each policy's best performing policy parametrizations across the different budgets, they were compared against each other. The intervals of 95% confidence for the policies' mean ranks in these inter-policy comparisons were then calculated. In the Figures discussed below, IQR stands for the interquartile range. The interquartile range represents the middle 50% of the data. It is related to quartiles because it is the difference between the third quartile and the first quartile of the data. In a ranked data set, quartiles are the three values that divide the data set into four equal parts. Each of the four parts contains 25% of the data. Quartile 1 is the smallest quartile: 25% of the data set is below quartile 1 and 75% of the data set is above quartile 1, and so on. In ranking, the results are sorted such that the best result has the lowest rank, the second best result has the second lowest rank, and so on. The worst result has the highest rank. The experimental results are discussed in the following.

FIGS. 4a, 4b and 4c show boxplots of HAMLET Variant 1 ranks for Experiment 1 and confirm that substantial levels of constant stochasticity, as well as too small levels of stochastic exploration, are detrimental for the performance of Variant 1. ϵ₁ = 0.1 and ϵ₂ = 0.1 were selected among the parametrizations performing equally for comparing with other policies. With smaller budgets the results do not change qualitatively.

FIGS. 5a, 5b, 5c and 5d show boxplots of HAMLET Variant 3 ranks for Experiment 1 and confirm that medium to large ρ for scaling the UCB exploration bonus is detrimental for the performance of Variant 3, as is deactivating the UCB bonus altogether. ρ = 0.05 was selected for comparing with other policies. With smaller budgets the results do not change qualitatively.

FIGS. 6a, 6b and 6c show selected boxplots of ranks for small datasets for different budgets for inter-policy comparisons. In particular, it can be seen that HAMLET Variants 1 and 3 achieve favorable performances over all experiments.

Similar to FIGS. 4a, 4b and 4c, FIGS. 7a, 7b and 7c confirm that high levels of constant stochasticity, as well as too small levels of stochastic exploration, are detrimental for the performance of Variant 1. The selection of ϵ₁ = 0.1 and ϵ₂ = 0.1 was confirmed among the parametrizations.

Similar to FIGS. 5a, 5b, 5c and 5d, FIGS. 8a, 8b, 8c and 8d confirm that medium to large ρ for scaling the UCB exploration bonus is detrimental for the performance of Variant 3. ρ = 0.05 was confirmed.

FIGS. 9a, 9b, 9c, 9d, 9e, 9f, 9g and 9h show boxplots of ranks for bigger datasets for all eight budgets. HAMLET Variant 1 (MasterLC-ϵ₁-ϵ₂) and Variant 3 (MasterLC-UCB-ρ) achieve favorable performance. At higher budgets, BestKRewards-7 and UCB can catch up.

HAMLET Variants 1 and 3 benefit from time awareness and learning curve extrapolation capability. Exploration is beneficially encouraged, but tempered, since too high levels of stochasticity or exploration bonus reduce the algorithm selection performance (see FIGS. 4a-c, 5a-d, 7a-c and 8a-d). For several low to medium budgets in each experiment, HAMLET Variants 1 and 3 perform better than the competitor policies. For budgets outside of that range, HAMLET Variants 1 and 3 perform at least on par with the BestK-Rewards policy.

The boxplots in FIGS. 6a-c and in FIGS. 9a-h indicate that HAMLET variants perform better than the compared-to policies for a range of budgets. FIG. 10 shows the aggregation of all 1,485 runs (99 traces × 15 budget levels). Here, HAMLET Variant 3 (learning curve extrapolation combined with an uncertainty bonus for exploration) in particular achieves a statistically significantly better performance at the 95% level than the other policies (except the HAMLET Variant 1), because the respective confidence intervals (CI) do not overlap. Therefore, it can be concluded from the experiments, and in particular from the performance of HAMLET Variant 3, that the inventors' discovery that the combination of learning curve extrapolation and accounting for computation time improves the performance of multi-armed bandits in algorithm selection problems is indeed correct. Considering that the applied learning curve extrapolation technique is straightforward (see Equation (1) above), more sophisticated learning curve techniques can also be applied, which would only increase the relative advantage of HAMLET over alternative policy approaches.

It was also verified that the trends of the results do not change when the different policy groups' best-performing policies are compared against each other for each budget. Finally, it was observed during the experiments that the BestK-Velocity policy usually performs much worse than the BestK-Rewards strategy or the UCB strategy.

In sum, the experiments with a range of bandit policy parametrizations show that even a straightforward approach to extrapolate learning curves is a valuable amendment for the bandit-based algorithm selection problem. Statistical analysis shows that the HAMLET Variants are at least as good as standard bandit approaches, while providing a number of improvements and advantages as detailed herein. Notably, HAMLET Variant 3, which combines learning curve extrapolation with a scaled UCB exploration bonus, performs superior to all non-HAMLET Variants, as shown in FIG. 10 illustrating the intervals of 95% confidence of the different policies' mean ranks, aggregated over all datasets and budgets. Even further performance improvements can be achieved, for example, using more sophisticated learning curve modeling approaches, e.g., BNN-based learning curve predictors, or by integrating meta-learning concepts, e.g., by evolving HAMLET into a contextual bandit.

While embodiments of the invention have been illustrated and described in detail in the drawings and foregoing description, such illustration and description are to be considered illustrative or exemplary and not restrictive. It will be understood that changes and modifications may be made by those of ordinary skill within the scope of the following claims. In particular, the present invention covers further embodiments with any combination of features from different embodiments described above and below. Additionally, statements made herein characterizing the invention refer to an embodiment of the invention and not necessarily all embodiments.

The terms used in the claims should be construed to have the broadest reasonable interpretation consistent with the foregoing description. For example, the use of the article “a” or “the” in introducing an element should not be interpreted as being exclusive of a plurality of elements. Likewise, the recitation of “or” should be interpreted as being inclusive, such that the recitation of “A or B” is not exclusive of “A and B,” unless it is clear from the context or the foregoing description that only one of A and B is intended. Further, the recitation of “at least one of A, B and C” should be interpreted as one or more of a group of elements consisting of A, B and C, and should not be interpreted as requiring at least one of each of the listed elements A, B and C, regardless of whether A, B and C are related as categories or otherwise. Moreover, the recitation of “A, B and/or C” or “at least one of A, B or C” should be interpreted as including any singular entity from the listed elements, e.g., A, any subset from the listed elements, e.g., A and B, or the entire list of elements A, B and C.

What is claimed is:
1. A method for automatically selecting a machine learning algorithm and tuning hyperparameters of the machine learning algorithm, the method comprising: receiving a dataset and a machine learning task from a user; controlling execution of a plurality of instantiations of different automated machine learning frameworks on the machine learning task each as a separate arm in consideration of available computational resources and time budget, whereby, during the execution by the separate arms, a plurality of machine learning models are trained and performance scores of the plurality of trained models are computed; and selecting one or more of the plurality of trained models for the machine learning task based on the performance scores.
2. The method according to claim 1, wherein the performance scores are extrapolated for a remainder of the time budget based on achieved performances of respective ones of the arms during a time interval of the execution which is a portion of the time budget.
3. The method according to claim 2, further comprising assigning the computational resources to the arms during the remainder of the time budget based on the extrapolated performance scores.
4. The method according to claim 2, wherein the performance scores are extrapolated by fitting a learning curve function to past rewards of the respective ones of the arms and extrapolating the past rewards until an end of the remainder of the time budget.
5. The method according to claim 2, further comprising freezing the execution of at least one of the arms based on the extrapolated performance scores.
6. The method according to claim 5, further comprising resuming the execution of the at least one of the arms from a point at which the freezing occurred.
7. The method according to claim 1, wherein at least some of the arms are executed by time multiplexing using a selection mechanism to allocate the computational resources to the arms during the time budget.
8. The method according to claim 1, wherein at least some of the arms are executed in parallel.
9. The method according to claim 1, further comprising building an ensemble from the plurality of trained models.
10. The method according to claim 1, wherein each of the arms is executed as a microservice component of a cloud computer system architecture in a docker container which has a container image for a respective one of the automated machine learning frameworks.
11. The method according to claim 10, wherein the docker containers are contained within a larger docker container, which contains separate docker containers for components which control the execution of the arms.
12. The method according to claim 1, further comprising constructing a learning curve for each of the arms during a time interval of the execution within the time budget, extrapolating performance scores of each of the arms until a remainder of the time budget, and freezing or disabling execution of at least some of the arms based on the extrapolated performance scores.
13. The method according to claim 12, wherein the learning curves are constructed based on maximum performance scores achieved by respective ones of the arms during the time interval.
14. A microservice component encapsulated in a docker container of a cloud computing system architecture comprising one or more processors which, alone or in combination, are configured to provide for execution of a method comprising: controlling execution of a plurality of instantiations of different automated machine learning frameworks on a machine learning task each as a separate arm in consideration of available computational resources and time budget, whereby, during the execution by the separate arms, a plurality of machine learning models are trained and performance scores of the plurality of trained models are computed.
15. A tangible, non-transitory computer-readable medium having instructions thereon which, upon being executed by one or more processors, alone or in combination, provide for execution of a method comprising: controlling execution of a plurality of instantiations of different automated machine learning frameworks on a machine learning task each as a separate arm in consideration of available computational resources and time budget, whereby, during the execution by the separate arms, a plurality of machine learning models are trained and performance scores of the plurality of trained models are computed.